SPEECH


               Most Macs can synthesise speech if the necessary system software is installed.
               A PowerMac with extra software can even respond to your spoken commands!


Synthesised Speech

Synthesised speech on the Mac is called text-to-speech (TTS). Any text, such as that in a document or dialog, can be converted into speech using a software synthesiser. To use TTS you’ll need an application that works with it. If you can’t find anything else, try SimpleText!


System Software for Speech Synthesis

TTS needs special software in your System Folder. The Easy Install option in the System Installer normally provides it automatically; if it doesn’t, you’ll have to use Custom Install. The Mac Plus doesn’t support TTS, and the presence of the TTS software can cause a freeze during startup. If this happens, hold down Space during startup to disable it with Extensions Manager.

To use TTS the following files must be in the System Folder:-

               •  In the Extensions folder:
                       Speech Manager
                       MacinTalk 2 and/or Macintalk 3 and/or MacIntalk Pro synthesiser
                       Voices folder with voices matching the version of Macintalk

               •  In the Control Panels folder:
                       Speech control panel


Speech Synthesisers                                                                                                

The MacinTalk 2 and 3 speech synthesisers are included in the system software but the Macintalk Pro synthesiser is a separate product.

MacinTalk 2 is suitable for any Mac with a 68000, 68020 or 68030 processor running at under 33 MHz. Even if you use Macintalk 3 you’ll still need MacinTalk 2 for MacinTalk 2 voices! These voices, such as Boris and RoboVox, only use a small amount of RAM. Macintalk 3 is an improved version for 68030 Macs running at 33 MHz or more.

MacIntalk Pro is for 68040 Macs or PowerMacs only. Its voices can use up to 5 MB of RAM each. Since they’re mainly contained in system memory you shouldn’t need to keep adjusting the memory assigned to a speech application, but the memory must still be available! Compressed voices require much less memory. The Gala Tea voices supplied with PlainTalk (see below) include TTS Male, Agnes, Bruce and Victoria; their RAM requirements are similar to the MacinTalk Pro voices.

Note: The varied upper-case letters in these file names aren’t a mistake!
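
The processor guidance above can be written down as a small decision function. This is only a sketch in Python: the function name and the processor labels are made up for illustration, and the text doesn’t say what happens with a 68020 running at 33 MHz or more, so that case is assumed to fall back to MacinTalk 2.

```python
def pick_synthesiser(cpu: str, mhz: float) -> str:
    """Suggest the synthesiser the text names for a given processor.

    `cpu` is a hypothetical label such as "68030" or "powerpc".
    """
    cpu = cpu.strip().lower()
    if cpu in ("68040", "powerpc"):
        # MacIntalk Pro is for 68040 Macs or PowerMacs only
        return "MacIntalk Pro"
    if cpu == "68030" and mhz >= 33:
        # Macintalk 3 is for 68030 Macs running at 33 MHz or more
        return "Macintalk 3"
    if cpu in ("68000", "68020", "68030"):
        # Assumption: anything else in the 68000 to 68030 range uses MacinTalk 2
        return "MacinTalk 2"
    raise ValueError(f"unrecognised processor: {cpu!r}")


print(pick_synthesiser("68030", 25))
print(pick_synthesiser("68030", 33))
print(pick_synthesiser("PowerPC", 66))
```

Remember that the voices you install must still match whichever synthesiser you end up with.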


The Voices Folder                                                                                                   

The voice files in the Voices folder can only be used by the appropriate Macintalk synthesiser. All available voices appear in the Speech control panel and speech application menus.

Note: A voice file will only work if its matching synthesiser is available. The voice file type is shown by the number on its icon.


The Speech Control Panel                                                                                         


               This panel is used to set the defaults for speech synthesis and recognition. You 
               can choose a default Macintalk voice and specify its rate of delivery.  Just click on 
               the loudspeaker box for a sample!


Speech Synthesis in Applications

With the appropriate system software installed you can use TTS in any application that supports it. For example, SimpleText will read out the contents of a file — you can choose a voice from the Sound menu.


               So To Speak is an application that demonstrates TTS to the full. The voices are 
               selected using a pop-up menu and during speech the pitch and rate of delivery 
               can be adjusted — the speech can be paused and stopped at any point.

Both So To Speak and Speaker (a neat application for reading text files) can use an external file containing a pronunciation dictionary (see below).


Voice Parameters                                                                                                   

Some applications let you modify the parameters of a voice. Here’s the dialog for an application called DictionaryEdit (see below):-


The parameters are:-

               Rate                  Speed of delivery, as in the Speech control panel
               Pitch                 Underlying tone of voice — higher for females!
               Modulation          Inflection, emphasising parts of words or sentences


Speech Synthesis in System Additions

                 Special control panels or extensions can be used to add extra TTS features to your 
                 Mac. For example, Speak2Me will read out the name of a selected icon in the 
                 Finder whilst SpeakAlert will speak the contents of any alert box.


Pronunciation Dictionaries

               TTS often pronounces words incorrectly — especially the names of people 
               and places. A pronunciation or speech dictionary can be used to tell a synthesiser 
               how to pronounce these words.

The dictionary describes the pronunciation of each word with phonemes and prosodic controls — each represented by a sequence of characters. Phonemes describe the sound of each syllable whilst prosodic controls add the stress and intonation for natural speech.

If a speech application finds a word in its dictionary, the synthesiser uses the dictionary’s entry instead of the standard conversion rules. A dictionary can substitute an alternative spoken word for the original, but it’s more often used to optimise the pronunciation.
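
The lookup order described here (dictionary first, standard rules second) can be sketched in a few lines of Python. Everything in it is illustrative: the function names are hypothetical, the stand-in for the standard rules simply tags the word, and exact-match lookup is an assumption because the text doesn’t say how words are compared.

```python
def pronounce(word, dictionary, standard_rules):
    """Return the phonemes for a word, preferring the dictionary entry."""
    entry = dictionary.get(word)  # assumption: exact-match lookup
    if entry is not None:
        return entry
    # No entry, so fall back to the synthesiser's standard conversion rules
    return standard_rules(word)


def stub_rules(word):
    """Hypothetical stand-in for the synthesiser's built-in rules."""
    return f"<rules:{word}>"


# The single entry uses the phoneme string for "application" from this chapter
my_dictionary = {"application": "AEplIHkEYSAXn"}

print(pronounce("application", my_dictionary, stub_rules))
print(pronounce("folder", my_dictionary, stub_rules))
```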

The dictionary comes in the form of a dict resource that’s loaded by the speech application prior to use. This resource can be in the application itself or in a file of Type dict or rsrc.

Most dictionaries don’t support abbreviated entries and only allow two fields per entry: the first is the text and the second is the phonemes. Neither field may contain more than 256 characters. The list inside a dictionary isn’t necessarily sorted in any particular order!
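
If you prepare entries outside a dictionary editor, the two-field layout and the 256-character limit are easy to check first. The Python sketch below is hypothetical throughout (the names are invented for illustration) and only tests the limits stated above; it knows nothing about the real dict resource format.

```python
MAX_FIELD_LENGTH = 256  # limit per field, as stated in the text


def check_entry(text, phonemes):
    """Return a list of problems found in one (text, phonemes) entry."""
    problems = []
    if len(text) > MAX_FIELD_LENGTH:
        problems.append("text field is longer than 256 characters")
    if len(phonemes) > MAX_FIELD_LENGTH:
        problems.append("phoneme field is longer than 256 characters")
    return problems


print(check_entry("application", "AEplIHkEYSAXn"))
print(check_entry("x" * 300, "AE"))
```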


Inside a Pronunciation Dictionary                                                                               


               DictionaryEdit is an application for modifying dicts. Several dictionaries can be 
               open at once, but you can only edit one dict at a time within a single file. 


You can cut and copy (or drag) entries between dictionaries or fill a dictionary by converting its text into phonemes.

When you select Open Dictionary… you’re presented with this window:-


The left-hand box contains each text entry in the dictionary. The lower box to the right contains a list of the phonemes and prosodic controls that make up the word samuel.

When you click on the left-hand icon the selected text is automatically converted into its component parts and placed in the phoneme list. You can then edit the phonemes or add prosodic controls as necessary.

As you work you can listen to the results by clicking on the right-hand icon. You can also open and edit text files to try out your new dictionaries.


Phonemes                                                                                                              

Phonemes are components of speech that are represented by case-sensitive symbols. For example, the vowels in the words bout and how are both represented by the AW phoneme, even though they’re spelt differently!

The full list of phonemes is:-

               AE  bat        EY  bait       AO  caught     AX  about
               IY  meet       EH  bet        IH  bit        AY  bite
               IX  closes     AA  cot        UW  boot       UH  book
               UX  mud        OW  boat       AW  bout       OY  boy
               b   bin        C   chin       d   dark       D   those
               f   fake       g   gain       h   hat        J   gin
               k   kin        l   limb       m   mat        n   knock
               N   tang       p   pin        r   ran        s   satin
               S   shin       t   tin        T   thin       v   van
               w   wet        y   yank       z   zen        Z   genre

               %   silence    @   breath intake

For example, the word application is AEplIHkEYSAXn.
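
To see how a string like this breaks down, here is a short Python sketch that splits it into the symbols from the table above. It assumes that a two-letter vowel symbol is always matched before a one-letter consonant; that is just one way to read the table, not a description of how the synthesiser itself works. The names are invented for illustration.

```python
VOWELS = {"AE", "EY", "AO", "AX", "IY", "EH", "IH", "AY",
          "IX", "AA", "UW", "UH", "UX", "OW", "AW", "OY"}
CONSONANTS = set("bCdDfghJklmnNprsStTvwyzZ")
PAUSES = {"%", "@"}  # silence and breath intake


def split_phonemes(text):
    """Split a phoneme string into its individual symbols."""
    symbols = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in VOWELS:
            # Two-letter vowel symbols are tried first
            symbols.append(pair)
            i += 2
        elif text[i] in CONSONANTS or text[i] in PAUSES:
            symbols.append(text[i])
            i += 1
        else:
            raise ValueError(f"unknown symbol {text[i]!r} at position {i}")
    return symbols


print(split_phonemes("AEplIHkEYSAXn"))
# ['AE', 'p', 'l', 'IH', 'k', 'EY', 'S', 'AX', 'n']
```

Because the symbols are case sensitive, s (satin) and S (shin) come out as different phonemes.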


Prosodic Codes                                                                                                      

Prosodic codes (or prosody symbols) are used with phonemes to fine tune the pronunciation.

They include:-

               Symbol      Meaning

                 1         Primary Stress
                 2         Secondary Stress
                 =         Syllable Mark
                 ~         Unstressed
                 _         Normal Stress
                 +         Emphatic Stress
                 /         Pitch Rise
                 \         Pitch Fall
                 >         Lengthen phoneme
                 <         Shorten phoneme
                 .         Sentence final fall
                 ?         Sentence final rise
                 !         Sentence final sharp fall
                 …         Clause final level
                 ,         Continuation rise
                 ;         Continuation rise
                 :         Clause final level
                 (         Start reduced range
                 )         End reduced range
                 “         Varies
                 ‘         Varies
                 ”         Varies
                 ’         Varies
                 -         Clause final level
                 &         Forces no silence between phonemes


The stress codes (1 and 2) indicate which syllables should be emphasised. For example, the word anticipation could be in the form:-

               AEnt2IHsIXp1EYSAXn

The syllable codes (=) break the word into syllables:-

               AEn=t2IH=sIX=p1EY=SAXn

The word prominence codes (~, _ and +) indicate the need to stress a particular word. The true prosodic codes (/, \, > and <) can be used to modify a phoneme to make a word sound more natural.
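
The stress and syllable codes can be picked apart with a few lines of Python. This sketch handles only the 1, 2 and = codes from the anticipation example and ignores all the other prosodic symbols; the function name is invented for illustration.

```python
STRESS = {"1": "primary", "2": "secondary"}


def describe_syllables(marked):
    """Split a marked-up word at '=' and report the stress of each syllable."""
    result = []
    for syllable in marked.split("="):
        # Look for a stress code anywhere in the syllable
        stress = next(
            (name for code, name in STRESS.items() if code in syllable),
            "none",
        )
        # Strip the stress codes to leave just the phonemes
        sounds = "".join(ch for ch in syllable if ch not in STRESS)
        result.append((sounds, stress))
    return result


for sounds, stress in describe_syllables("AEn=t2IH=sIX=p1EY=SAXn"):
    print(sounds, stress)
# AEn none
# tIH secondary
# sIX none
# pEY primary
# SAXn none
```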


The Dictionary Header                                                                                            

Some applications let you edit the dictionary’s header — most users won’t need to modify it. In DictionaryEdit you’re presented with this window:-


The parameters are:-

               Parameter            Usual contents

               Atom type            This is dict for a standard dictionary
               Format version       This is 1 for a pronunciation dictionary
               Script               For the Roman system this is 0
               Language             Code for English is 0
               Region               Code for the USA is 0; for the UK it’s 2
               Date Last Modified   Last date the dictionary was modified
               Dict size            Total byte length of the dictionary

The size indicates how much memory the dictionary will use when loaded.


Speech Recognition


The PlainTalk software package lets you speak instructions to your Mac — but only if it’s an AV Mac or PowerMac. If you install MacIntalk Pro as well it will reply in good voice!

Note: For full details see the information supplied with the PlainTalk package.

To use speech recognition you’ll need a PlainTalk microphone. The circular microphone supplied with many Macs isn’t suitable — but the one built into an AV monitor is! Many Macs can’t accept a PlainTalk microphone!

See the Sound chapter for more about microphones and sound inputs.

               You’ll also need the Speakable Items software, the Speech control panel and the 
               Speech Recognition extension. You should also install the software for TTS!


Other necessary extension files include:

               Extension                                 Function                                                          

               Speech Macro Editor              Defines Mac instructions
               My Speech Macros                  Contains your instructions
               SR Monitor                               Monitors and interprets speech
               System Speech Rules                For different voices and speech dialects

With recognition enabled, a list of spoken commands appears in the Speakable Items folder in the Apple menu. The Listening option in the Speech control panel lets you pick a key combination to switch the Mac into listening mode, after which it accepts commands.


©Ray White. All Rights Reserved 1997